Automatic Dialect Identification: A Study of British English

Authors

  • Emmanuel Ferragne
  • François Pellegrino
Abstract

This contribution deals with the automatic identification of the dialects of the British Isles. Several methods based on the linguistic study of dialect-specific vowel systems are proposed and compared using the Accents of the British Isles (ABI) corpus. The first method examines differences in diphthongization for the FACE lexical set. In a two-dialect discrimination task, scores range from chance level to ca. 98% correct decisions, depending on the pair of dialects under test. Using the ACCDIST method (developed in [1]), the second and third experiments take dialectal differences in the structure of vowel systems into consideration; evaluation is performed on a 13-dialect closed-set identification task. Correct identification reaches up to 90% with two subsets of the ABI corpus (the /hVd/ set and read passages). All these experiments rely on a front-end automatic phonetic alignment and are therefore text-dependent. Results and possible improvements are discussed in the light of British dialectology.
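To illustrate the idea behind ACCDIST-style comparison, the following is a minimal sketch and not the authors' implementation. It assumes each speaker is summarized by one feature vector per vowel word, builds a table of distances between that speaker's own vowels, and picks the dialect whose table correlates best with the test speaker's table. The function names, the two dialect labels, and the (F1, F2) values are invented for illustration; the abstract does not say which acoustic features or dialect models were used.

```python
import math
from itertools import combinations

def distance_table(vowels):
    """Flatten all pairwise Euclidean distances between one speaker's vowels."""
    keys = sorted(vowels)
    return [math.dist(vowels[a], vowels[b]) for a, b in combinations(keys, 2)]

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def identify(test_vowels, dialect_models):
    """Return the dialect whose distance table correlates best with the test table."""
    test = distance_table(test_vowels)
    return max(
        dialect_models,
        key=lambda d: pearson(test, distance_table(dialect_models[d])),
    )

# Hypothetical (F1, F2) values in Hz for four /hVd/ words; not taken from the ABI corpus.
models = {
    "dialect_A": {"heed": (300, 2300), "had": (700, 1700),
                  "hood": (400, 1000), "hud": (650, 1200)},
    # In dialect_B, "hud" sits close to "hood", so the vowel system is structured differently.
    "dialect_B": {"heed": (300, 2300), "had": (750, 1500),
                  "hood": (400, 1000), "hud": (420, 1050)},
}

# A test speaker with dialect_A's vowel structure but uniformly scaled formants.
test_speaker = {w: (f1 * 1.1, f2 * 1.1) for w, (f1, f2) in models["dialect_A"].items()}

best = identify(test_speaker, models)
```

Because the comparison uses correlations between within-speaker distances rather than raw feature values, a uniform scaling of the test speaker's formants leaves the match to dialect_A unaffected. This is the sense in which such a method looks at the structure of the vowel system rather than at absolute acoustic values.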

Similar resources

Dialect analysis and modeling for automatic classification

In this paper, we present our recent work in the analysis and modeling of speech under dialect. Dialect and accent significantly influence automatic speech recognition performance, and therefore it is critical to detect and classify non-native speech. In this study, we consider three areas that include: (i) prosodic structure (normalized f0, syllable rate, and sentence duration), (ii) phoneme a...

You had me at "Hello": Rapid extraction of dialect information from spoken words

Research on the neuronal underpinnings of speaker identity recognition has identified voice-selective areas in the human brain with evolutionary homologues in non-human primates who have comparable areas for processing species-specific calls. Most studies have focused on estimating the extent and location of these areas. In contrast, relatively few experiments have investigated the time-course ...

Rhythm in read British English: interdialect variability

Duration features have been thought to be the most obvious correlates of speech rhythm. Previous studies have shown that they can be used to distinguish among some of the world's languages. This paper investigates to what extent the methods employed in these studies can be applied to the dialects of British English. We have tested whether a set of variables derived from automatically extracted duratio...

Classifying English Documents by National Dialect

We investigate national dialect identification, the task of classifying English documents according to their country of origin. We use corpora of known national origin as a proxy for national dialect. In order to identify general (as opposed to corpus-specific) characteristics of national dialects of English, we make use of a variety of corpora of different sources, with inter-corpus variation ...

Word-Based Dialect Identification with Georeferenced Rules

We present a novel approach for (written) dialect identification based on the discriminative potential of entire words. We generate Swiss German dialect words from a Standard German lexicon with the help of hand-crafted phonetic/graphemic rules that are associated with occurrence maps extracted from a linguistic atlas created through extensive empirical fieldwork. In comparison with a character...

Publication date: 2007